fix: remove misdirected SemanticMsg enqueue causing O(n²) reprocessing by chethanuk · Pull Request #776 · volcengine/OpenViking

chethanuk · 2026-03-19T11:22:56Z

Summary

Removed misdirected SemanticMsg enqueue in session.py (lines 309-324): After _flush_semantic_operations() already enqueued correct messages with change-tracking dicts, commit_async() was enqueuing a second SemanticMsg targeting the session URI (not a memory directory) with changes=None
Hardened cache loading in semantic_processor.py: _process_memory_directory() now always loads cached summaries from .overview.md regardless of msg.changes value (defense-in-depth against other changes=None callers like lock_manager.py and summarizer.py)
Added 2 regression tests verifying both the enqueue removal and the cache guard

Problem

On every commit, session.py:309-324 enqueued a SemanticMsg with changes=None after the compressor had already correctly enqueued messages with proper change-tracking dicts. This misdirected message bypassed the cache guard in _process_memory_directory() (which only loaded cached summaries when msg.changes was truthy), causing unconditional VLM API calls for every memory file — O(n²) reprocessing. At 500 memories this wasted 100K+ tokens/day.

The timeline on each commit

commit_async() starts
→ compressor.extract_long_term_memories()
→ compressor._flush_semantic_operations()
→ enqueues SemanticMsg(uri=memory_dir, changes={added: [...]}) ✓ CORRECT
→ session.py lines 309-324
→ enqueues SemanticMsg(uri=session_dir, changes=None) ✗ MISDIRECTED
Processor handles msg from step 4: loads cache, only regenerates changed files
Processor handles msg from step 6: no cache, VLM call for EVERY file

Step 6-8 is waste. Removing lines 309-324 eliminates step 6, so step 8 never happens.

Fix

session.py: Removed 16-line block that enqueued the misdirected SemanticMsg — the compressor's _flush_semantic_operations() already handles this correctly
semantic_processor.py: Removed if msg.changes: guard on cache loading so .overview.md is always consulted
Closes Memory extraction triggers O(n²) semantic reprocessing — token cost grows quadratically with memory count #505

Testing

2 regression tests added (test_fix_505_duplicate_semantic_enqueue.py)
- test_no_misdirected_semantic_enqueue_after_flush — verifies commit_async() doesn't enqueue after compressor flush
- test_process_memory_directory_loads_cache_when_changes_none — verifies cache is loaded even when msg.changes is None
Both tests confirmed RED before fix, GREEN after fix (TDD cycle)
Full test suite: no regressions (pre-existing config errors unrelated)

Related Issue

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update

Changes Made

Testing

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have tested this on the following platforms:
- Linux
- macOS
- Windows

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

volcengine#505) On every commit, session.py enqueued a SemanticMsg targeting the session URI (not a memory directory) with changes=None after the compressor had already enqueued correct messages with change-tracking dicts. This misdirected message bypassed the cache guard in _process_memory_directory() (which only loaded cached summaries when msg.changes was truthy), causing unconditional VLM API calls for every memory file — O(n²) reprocessing. Fix: - Remove the misdirected enqueue block in session.py (lines 309-324) - Harden cache loading in semantic_processor.py to always consult cached summaries regardless of msg.changes value (defense-in-depth) Impact: ~98.7% token reduction at 500 memories (100K+ tokens/day saved)

Cherry-picked from upstream PR volcengine#776: 1. Remove duplicate SemanticMsg enqueue in session.py commit_async(). After _flush_semantic_operations() already enqueues messages with proper change-tracking dicts, commit_async() was enqueueing a second SemanticMsg with changes=None targeting the session URI. This bypassed the cache guard in _process_memory_directory(), causing full O(n²) directory reprocessing on every commit. 2. Harden _process_memory_directory() to always load cached summaries from .overview.md, regardless of msg.changes value. Defense-in-depth against any caller that sends changes=None. Observed impact: 1 simple chat triggered 49+ LLM calls (vs expected ~10) because every semantic job re-summarized ALL files in entities directory instead of only changed ones. Ref: volcengine#776 Ref: volcengine#769 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

myysy · 2026-03-20T10:43:02Z

Thanks for the PR. The final semantic step in commit_async targets the session path, which is different from the memory path flushed by the compressor, so it can’t be removed or skipped. We’ll revisit and adjust the session-path semantic logic later.

chethanuk · 2026-03-20T14:51:29Z

The final semantic step in commit_async targets the session path, which is different from the memory path flushed by the compressor, so it can’t be removed or skipped. We’ll revisit and adjust the session-path semantic logic later

May be you can explain what need to be done and bit more context so I can understand in detail and work on it?

github-project-automation bot added this to OpenViking project Mar 19, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 19, 2026

MaojiaSheng approved these changes Mar 19, 2026

View reviewed changes

qin-ctx assigned myysy Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove misdirected SemanticMsg enqueue causing O(n²) reprocessing#776

fix: remove misdirected SemanticMsg enqueue causing O(n²) reprocessing#776
chethanuk wants to merge 1 commit intovolcengine:mainfrom
chethanuk:fix/505-quadratic-semantic-reprocessing

chethanuk commented Mar 19, 2026

Uh oh!

myysy commented Mar 20, 2026

Uh oh!

chethanuk commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

chethanuk commented Mar 19, 2026

Summary

Problem

Fix

Testing

Related Issue

Type of Change

Changes Made

Testing

Checklist

Screenshots (if applicable)

Additional Notes

Uh oh!

myysy commented Mar 20, 2026

Uh oh!

chethanuk commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants